Mining First-order Knowledge Bases for Association Rules - Jamil

...complexity of even the best known methods remains high. While several efficient algorithms for
association rule mining have been proposed [1, 6, 20, 15, 9, 19, 14, 21], overall efficiency is still a
major issue, especially for other kinds of rule induction such as ratio rules [8] and chi-square rules
[14]. While many forms of rule induction are interesting, association rules were found to be appealing
because of their simplicity and intuitiveness. In this paradigm, the rule mining process is divided into two
distinct steps: discovering frequent itemsets and generating rules. There are ....

....As future research, we plan to develop optimization techniques for mining queries that require non-trivial
look-ahead and pruning techniques in aggregate functions. The developments presented here also
have other significant implications. For example, it is now possible to compute chi-square rules [4]
using the building blocks provided by our system. To our knowledge, declarative computation of chi-square
rules has never been attempted, because of the many procedural concepts on which the chi-square
method relies. In a separate work [2] we show that the counting method proposed in this paper ....

Sergey Brin, Rajeev Motwani, and Craig Silverstein. Beyond market baskets: Generalizing association
rules to correlations. In Proc. ACM SIGMOD, pages 265-276, 1997.



Association Rule Mining on Remotely Sensed Imagery Using P-Trees - Ding (2002) (1 citation) 


....first step, which is the generation of frequent itemsets [AS94]. Having determined the frequent
itemsets, the second step is very straightforward and provides few possibilities for improvement. The
reason is that confidence does not have any closure property, while support has a downward closure
property [BMS97]. By downward closure, we mean that if a set has a property, then all its subsets
also have this property. Support is downward closed because if a set of items satisfies
the minimum support, then all its subsets also satisfy the minimum support. The downward closure
property....
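The downward closure property is what allows an Apriori-style miner to discard a candidate itemset without ever counting it. The following minimal Python sketch of the usual join-and-prune candidate step assumes itemsets are represented as frozensets; `generate_candidates` is a hypothetical name, not taken from the cited papers.

```python
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Return candidate k-itemsets built from the frequent (k-1)-itemsets.

    frequent_prev is a set of frozensets of size k - 1. A candidate survives
    only if every one of its (k-1)-subsets is frequent (downward closure).
    """
    prev = list(frequent_prev)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]
            if len(union) != k:
                continue  # the two itemsets differ in more than one item
            # Downward closure: a single infrequent subset is enough to rule
            # the candidate out without counting it in the database.
            if all(frozenset(sub) in frequent_prev
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets over items a, b, c, d.
frequent_2 = {frozenset(pair) for pair in ["ab", "ac", "bc", "bd"]}
print(sorted("".join(sorted(c)) for c in generate_candidates(frequent_2, 3)))  # ['abc']
```

Here {a, b, d} is pruned because {a, d} is not frequent, and {b, c, d} because {c, d} is not, so only {a, b, c} would be counted.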

....study of interestingness measures for association rule patterns is given in [TK00]. There are some
critiques of the support-confidence framework because it does not address some problems
such as negative implications. In addition, it may lead to misleading rules in some situations
[BMS97, SBM98]. Table 2.3 gives a tea-coffee example as a contingency table, where an item name (e.g. TEA)
represents the presence of the item and a negated name (¬TEA) its absence, and the numbers represent
percentages of purchases.

Table 2.3. Contingency table of tea and coffee purchase

                TEA   ¬TEA   row sum
  COFFEE         20     70        90
  ¬COFFEE         5      5        10
  column sum     25     75       100
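The tea-coffee figures can be turned directly into rule measures. This minimal Python sketch reads the percentages of Table 2.3 as counts over 100 transactions (an assumption for illustration) and computes support, confidence, and interest for the rule tea ⇒ coffee; `rule_measures` is a hypothetical helper.

```python
def rule_measures(n_xy, n_x, n_y, n):
    """Support, confidence and interest of the rule X => Y from transaction counts.

    n_xy: transactions containing both X and Y; n_x, n_y: transactions
    containing X (resp. Y); n: total number of transactions.
    """
    support = n_xy / n
    confidence = n_xy / n_x
    # Interest (also called lift) compares the observed co-occurrence with the
    # co-occurrence expected if X and Y were independent.
    interest = support / ((n_x / n) * (n_y / n))
    return support, confidence, interest


# Table 2.3 read as 100 transactions: X = tea, Y = coffee.
support, confidence, interest = rule_measures(n_xy=20, n_x=25, n_y=90, n=100)
print(f"support={support:.2f} confidence={confidence:.2f} interest={interest:.3f}")
# support=0.20 confidence=0.80 interest=0.889
```

A confidence of 80% looks strong, but 90% of all transactions contain coffee anyway, so buying tea actually lowers the likelihood of buying coffee. The interest value below 1 exposes this negative dependence, which the support-confidence framework alone does not reveal.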
....[Figure 1. Transactions and Association Rules; (b) Examples of Association Rules] The revision of the
Apriori algorithm adopting the chi-squared test has been investigated [4]. This method suffers from
generating too many uncorrelated rules because it still uses the support threshold [4]. S. Morishita
suggested a scalable statistical pruning method by computing an upper bound of a statistical metric such
as the chi-squared value, but the upper bound of the statistical ....

....The revision of the Apriori algorithm adopting the chi-squared test has been investigated [4]. This
method suffers from generating too many uncorrelated rules because it still uses the support
threshold [4]. S. Morishita suggested a scalable statistical pruning method by computing an upper
bound of a statistical metric such as the chi-squared value, but the upper bound of the statistical metric is only
valid for binary feature sets [15]. Correlation techniques have the following limitations: They ....
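The chi-squared test mentioned in these snippets measures how far the observed cells of a contingency table deviate from the counts expected if the items were independent. This is a minimal Python sketch for the 2x2 case of two items; `chi_squared_2x2` is a hypothetical name, and the counts in the example are hypothetical.

```python
def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 contingency table of two items.

    table[i][j] is the observed count for row i (first item present/absent)
    and column j (second item present/absent). Assumes every row and column
    sum is positive, so no expected count is zero.
    """
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the two items were independent.
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts over 100 transactions:
# rows = coffee / no coffee, columns = tea / no tea.
counts = [[20, 70], [5, 5]]
print(round(chi_squared_2x2(counts), 3))  # 3.704
```

With one degree of freedom the 95% critical value is about 3.84, so these proportions would not be judged significantly dependent over 100 transactions. The statistic grows linearly with the number of transactions when the proportions stay fixed, so the same proportions over 1,000 transactions would be.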

S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: generalizing association rules to
correlations. In Proc. ACM SIGMOD International Conference on Management of Data, pages
265-276, Tucson, Arizona, 1997.



Using Association Rules for Product Assortment.. - Brijs, Swinnen.. (1999) (12 citations)

....cigarette paper [absolute sup = 291, conf = 0.82]. These rules demonstrate that whenever a
customer buys cigarette paper, he/she also buys tobacco (confidence = 100%), and that when a
customer buys tobacco, he/she will often also buy cigarette paper with it (confidence = 82%). A more
formal method [9] to assess the dependence between two or more products is interest. Definition 5:
$\mathrm{Interest} = \frac{s(X \cup Y)}{s(X)\,s(Y)}$. The numerator s(X ∪ Y) measures the observed frequency of the co-occurrence
of the items in the antecedent (X) and the consequent (Y) of the rule. The denominator s(X)·s(Y)....
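Interest as defined above can be computed directly from a list of transactions. In this minimal Python sketch the baskets are hypothetical (not the data of the study), s is taken to be relative support, and the helper names are illustrative. A value above 1 indicates that X and Y occur together more often than independence predicts, 1 indicates independence, and a value below 1 indicates negative dependence.

```python
def support(transactions, itemset):
    """Relative support s(itemset): fraction of transactions containing it."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def interest(transactions, x, y):
    """Interest of X => Y, i.e. s(X ∪ Y) / (s(X) * s(Y))."""
    return support(transactions, x | y) / (
        support(transactions, x) * support(transactions, y))


# Hypothetical baskets, not the data of the cited study.
baskets = [
    {"tobacco", "cigarette paper"},
    {"tobacco", "cigarette paper", "lighter"},
    {"tobacco"},
    {"bread", "milk"},
    {"milk"},
]
x, y = frozenset({"cigarette paper"}), frozenset({"tobacco"})
print(round(interest(baskets, x, y), 3))  # 1.667
```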

Brin, S., Motwani, R., and Silverstein, C. Beyond market baskets: generalizing association rules to
correlations. In Peckham, J. (ed.), Proceedings of the ACM SIGMOD Conference on Management of
Data, 1997 (SIGMOD '97), 265-276.



Levelwise Search of Frequent Patterns with Counting.. - Bastide, Taouil..

....has been conducted on this topic. The problem of mining frequent patterns arose first as a
subproblem of mining association rules, but then it turned out that frequent patterns solve a variety
of problems: mining sequential patterns [AS95], episodes [MTV97], association rules [AS94],
correlations [BMS97, SBM98], multi-dimensional patterns [KHC97, LSW97], maximal patterns
[ZPOL97, LK98], and several other important knowledge discovery tasks [HPY00]. Since the
complexity of this problem is exponential in the size of the binary database input relation and
since this relation has to be scanned several times ....

S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to
correlation. Proc. SIGMOD Conf., pp. 265-276, May 1997.
....much work has been conducted on this topic. The problem of mining frequent patterns arose first as
a subproblem of mining association rules [1], but it then turned out to be present in a variety of
problems [18]: mining sequential patterns [3], episodes [26], association rules [2], correlations [10,
37], multi-dimensional patterns [21, 22], maximal patterns [8, 53, 23], and closed patterns [47, 31, 33].
Since the complexity of this problem is exponential in the size of the binary database input relation
and since this relation has to be scanned several times during the process, efficient algorithms for ....

S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to
correlation. In Proc. ACM SIGMOD Intl. Conf. on Management of Data, pages 265-276, May 1997.
....in association mining is developing parallel mining algorithms for finding association rules [12, 21].
Other researchers are concerned with different issues; one recent debate is the appropriateness of using
support and confidence to assess relationship or association. Brin, Motwani and Silverstein [9] suggested that
the dependence ratio or correlation between two sets is more appropriate for calculating
relationships than support and confidence. The algorithm they proposed involved merging the frequent set
generation and correlation calculation algorithms into one to increase the pruning power of the ....



....value in the output.

                 T is true   T is false   Sum of Row
  S is true      s1t1        s1t0         s1
  S is false     s0t1        s0t0         s0
  Sum of Column  t1          t0           n

Table 3.3: Contingency Table for Association Rules

3.6.2 Correlation. Some researchers believe that correlation is a better measure of association than
confidence [9]. Given this, the prototype allows the user to choose correlation as a relationship metric
instead of confidence. We believe that giving the user this choice makes the new exploratory model more
flexible, redefining relationship in association mining as any metric one sees fit ....
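For two binary attributes, correlation is usually measured with the phi coefficient, which is Pearson's correlation applied to 0/1 values and can be computed from the four cells of Table 3.3. This is a minimal Python sketch, not the prototype's code; `phi_coefficient` is a hypothetical name and the example counts are hypothetical. The result ranges from -1 to 1, and n times its square equals the chi-squared statistic of the same table.

```python
import math


def phi_coefficient(s1t1, s1t0, s0t1, s0t0):
    """Correlation (phi coefficient) between binary attributes S and T.

    Arguments are the four cell counts of Table 3.3; the row sums s1, s0 and
    column sums t1, t0 are derived from them.
    """
    s1, s0 = s1t1 + s1t0, s0t1 + s0t0   # row sums
    t1, t0 = s1t1 + s0t1, s1t0 + s0t0   # column sums
    denominator = math.sqrt(s1 * s0 * t1 * t0)
    if denominator == 0:
        # Assumption: report 0 when S or T never varies (phi is undefined then).
        return 0.0
    return (s1t1 * s0t0 - s1t0 * s0t1) / denominator


# Hypothetical counts in which S and T mostly occur together.
print(phi_coefficient(40, 10, 10, 40))  # 0.6
```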
Parallel Formulations of Tree-Projection-Based Sequence.. - Guralnik, Karypis

....those based on the level-wise paradigm; nevertheless, they still require a substantial amount of time. A
number of efficient and scalable parallel formulations have been developed for finding frequent
itemsets and sequences that are based on the candidate generation and counting framework [3, 18,
22, 16, 4], both for shared and distributed memory parallel computers [2, 22, 17, 8, 25, 29, 20].
However, the problem of parallelizing equivalence class based and projection based algorithms has
received relatively little attention, and existing parallel formulations for them have been targeted
only toward ....

....[27] algorithm extended the Apriori-like level-wise mining method to find frequent patterns in
sequential datasets. The basic level-wise algorithm has been extended in a number of different ways,
leading to more efficient algorithms such as DHP [19, 18], Partition [22], SEAR and Spear [16], and
DIC [4]. An entirely different approach for finding frequent itemsets and sequences is taken by the equivalence
class based algorithms Eclat [32] and SPADE [31], which break the large search space of frequent patterns
into small and independent chunks and use a vertical database format that allows them to ....
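The equivalence-class, vertical-format approach described here can be sketched in a few lines of Python. This is a simplified Eclat-style miner under stated assumptions (items are sortable, the database fits in memory, and `min_count` is an absolute support count), not Zaki's implementation; `eclat` is an illustrative name. Each frequent itemset's tidset is obtained by intersecting tidsets within its equivalence class, so the transactions are read only once, to build the vertical layout.

```python
def eclat(transactions, min_count):
    """Depth-first frequent itemset mining over a vertical (tidset) layout.

    Returns a dict mapping each frequent itemset, as a tuple of sorted items,
    to the number of transactions that contain it.
    """
    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)
    frequent_items = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    result = {}

    def extend(prefix, members):
        # `members` is one equivalence class: every (item, tidset) pair shares
        # `prefix`, and the tidset belongs to prefix + (item,).
        for i, (item, tids) in enumerate(members):
            itemset = prefix + (item,)
            result[itemset] = len(tids)
            # Build the class of `itemset` by intersecting tidsets; no pass
            # over the original transactions is needed.
            child_class = []
            for other_item, other_tids in members[i + 1:]:
                joint = tids & other_tids
                if len(joint) >= min_count:
                    child_class.append((other_item, joint))
            if child_class:
                extend(itemset, child_class)

    extend((), frequent_items)
    return result


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(eclat(baskets, min_count=3))
# {('a',): 4, ('a', 'b'): 3, ('a', 'c'): 3, ('b',): 4, ('b', 'c'): 3, ('c',): 4}
```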
Exploratory Mining via Constrained Frequent Set Queries - Ng, Lakshmanan, Han, Mah (1999)
....exceed given thresholds. While such associations are useful, other notions of relationships may also be
useful. First, there exist several significance metrics other than confidence that are equally meaningful.
For example, Brin et al. argue why correlation can be more useful in many circumstances [2].
Second, there may be separate criteria for selecting candidates for the antecedent and consequent of a
rule. For example, the user may want to find associations from sets of items to sets of types. Coming
from different domains, the antecedent and consequent may call for different support ....

S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to
correlations. In Proc. 1997 ACM-SIGMOD, pp. 265-276.

Finding Frequent Patterns Using Length-Decreasing Support.. - Seno, Karypis

....algorithm extended the Apriori-like level-wise mining method to find frequent patterns in sequential
databases. The basic level-wise algorithm has been extended in a number of different ways, leading
to more efficient algorithms such as DHP [14, 13], Partition [19], SEAR and Spear [12], and DIC
[5]. An entirely different approach for finding frequent itemsets and sequences is taken by the equivalence class
based algorithms Eclat [26] and SPADE [24], which break the large search space of frequent patterns into
small and independent chunks and use a vertical database format that allows them to ....
STAMP: On Discovery of Statistically Important Pattern.. - Yang, Wang, Yu

....was not fully taken into account by the multiple support model. In contrast, the generalized
information gain metric would capture the difference of occurrences between B and C. 5.2.3
Statistically Significant Patterns. There is much work on discovering statistically significant
patterns [5, 18, 27]. All of this work only takes into account the occurrence of a pattern in a sequence or
a transaction. However, it does not assign any penalty if a pattern fails to be present when it is supposed
to be. In addition, all of this work only discovers the significant patterns for the entire data set, and ....

S. Brin, R. Motwani, C. Silverstein. Beyond market baskets: generalizing association rules to
correlations. Proc. ACM SIGMOD Conf. on Management of Data, 265-276, 1997.



Closed Set Based Discovery of Small Covers for Association.. - Pasquier, Bastide, Taouil (1999)
(9 citations)

....according to the user preferences. In contrast, the second trend addresses the problem with an a priori
vision, by attempting to minimize the number of exhibited rules. In [14, 28], information about
taxonomies is used to define criteria of interest, which are applied to prune redundant rules. In [7, 25],
statistical measures such as Pearson's correlation or the chi-squared test are used instead of the
confidence measure. 1.2 Contribution: an Overview. The approach presented in this paper belongs to the
second trend, since it aims to extract not all possible rules but a subset called small cover ....
....specified patterns. Information in taxonomies associated with the dataset can also be integrated in the
process, as proposed in [14, 28] for extracting bases for generalized (multi-level) association rules.
Integrating item constraints and statistical measures, such as those described in [5, 22, 29] and [7, 25]
respectively, in the generation of bases requires further work. Functional and approximate
dependencies. Algorithms presented in this paper can be adapted to generate bases for functional and
approximate dependencies. In [15, 20], such bases and algorithms for generating them were proposed.

S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to
correlation. Proc. of the ACM SIGMOD Conference, pages 265-276, May 1997.
....those based on the level-wise paradigm; nevertheless, they still require a substantial amount of time. A
number of efficient and scalable parallel formulations have been developed for finding frequent
itemsets and sequences that are based on the candidate generation and counting framework [3, 18,
22, 16, 4], both for shared and distributed memory parallel computers [2, 22, 17, 8, 25, 29, 20].
However, the problem of parallelizing equivalence class based and projection based algorithms has
received relatively little attention, and existing parallel formulations for them have been targeted
only toward ....

....[27] algorithm extended the Apriori-like level-wise mining method to find frequent patterns in
sequential datasets. The basic level-wise algorithm has been extended in a number of different ways,
leading to more efficient algorithms such as DHP [19, 18], Partition [22], SEAR and Spear [16], and
DIC [4]. An entirely different approach for finding frequent itemsets and sequences is taken by the equivalence
class based algorithms Eclat [32] and SPADE [31], which break the large search space of frequent patterns
into small and independent chunks and use a vertical database format that allows them to ....

S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to
correlations. In Proc. of 1997 ACM-SIGMOD Int. Conf. on Management of Data, Tucson, Arizona,
1997.



Optimization of Constrained Frequent Set Queries with.. - Lakshmanan, Ng, Han (1998) (23 citations)

....group includes studies that go beyond the initial notion of association rules to other kinds of
mined rules, e.g. multi-level rules [8, 21], quantitative and multi-dimensional rules [22, 7, 14, 10],
rules with item constraints [23], mining long patterns [3], correlations and causal structures [4, 20],
ratio rules [12], etc. Recently, it has been recognized that the integration of data mining technologies with
database management systems is of crucial importance [5]. Furthermore, it has been argued that the
fundamental distinction of a data mining system from a statistical program or ....

S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to
correlations. In Proc. 1997 ACM-SIGMOD, pp. 265-276.
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....algorithm extended the Apriori like fflfflj wise mining method to find frequent patterns in sequential 
databases. The basic y|||f] wise algorithm has been extended in a number of different ways leading 
to more efficient algorithms such as DHP [14, 13] Partition [19] SEAR and Spear [12] and DIC 

[5|. An entirely different approach for finding frequent itemsets and sequences are the equivalence class 
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based algorithms Eclat [26] and SPADE [24] that break the large search space of frequent patterns into 
small and independent chunks and use a vertical database format that allows them to .... 
Exploratory Mining and Pruning Optimizations of.. - Ng, Lakshmanan, Pang.. (1998) (85 citations)
....exceed given thresholds. While such associations are useful, other notions of relationships may also be
useful. First, there exist several significance metrics other than confidence that are equally meaningful.
For example, Brin et al. argue why correlation can be more useful in many circumstances [5].
Second, there may be separate criteria for selecting candidates for the antecedent and consequent of a
rule. For example, the user may want to find associations from sets of items to sets of types. The rule
pepsi ⇒ snacks is an instance of such an association, meaning that customers often buy the ....

....Phase II, the user can specify the desired significance metric, and can give different conditions that
must be satisfied by the antecedent and consequent of the relationships to be formed. There are already
several proposals in the literature that make the notion of associations less rigid [5, 7, 9, 12, 14,
21]. We are not proposing another here. Instead, we are proposing an architecture that allows many of
those alternative notions to coexist, and that permits the user to choose whatever is appropriate for the
application. 2 Architecture. Figure 1 shows a two-phase architecture for exploratory ....
....X ⇒ Y, where X and Y are subsets of R. The most popular interestingness measure for an association
rule X ⇒ Y is its accuracy (or confidence), which is defined as
$\mathrm{acc}(X \Rightarrow Y, d) = \frac{\mathrm{fr}(X \cup Y, d)}{\mathrm{fr}(X, d)}$. Several other
classes of patterns and measures of interestingness have also been studied (see e.g. [4, 6, 9, 13, 14, 15,
27, 32, 33, 36, 37, 43, 44, 47, 48]). It is not always easy to define an interestingness measure in
such a way that there would be a threshold value that the measure of p reaches for almost all interesting
patterns p and for only very few uninteresting ones. One way to augment the interestingness measure
is to define additional ....
....is $\sum_{j=1}^{t-m+1} N(P_i, W(j, m))$, where N(P_i, W(j, m)) is the number of object histories which
follow P_i on window W(j, m). 3.1.2 Strength. Different methods can be used to capture the degree of
nonindependence. In this paper, we use a metric that is similar to interest defined in [4] to measure
the strength of a temporal association rule. Definition 3.3: Given a temporal association rule R: X ⇒ Y
and a sequence S_1, S_2, ..., S_t, the strength of the rule is
$\frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X) \times \mathrm{Support}(Y)}$. 3.1.3 Density. Since ....
....(i.e. 1) towards its significance, regardless of its likelihood of occurrence. Intuitively, the assessment
of significance of a pattern in a sequence should take into account the expectation of pattern
occurrence (according to some prior knowledge). Recently, much research has been conducted [1, 3,
5, 6, 8, 9, 10, 11, 12, 15] toward this objective. We will furnish an overview in the next section. In this
paper, a new model is proposed to characterize the class of so-called surprising patterns (instead of
frequent patterns). We will see that our model not only has a solid theoretical foundation but also allows
an ....

....not fully taken into account by the multiple support model. In contrast, the information gain metric
proposed in this paper would capture the difference of occurrences between B and C. Mining patterns
that are statistically significant (rather than frequent) has become a popular topic. Brin et al. [3] first
introduced the concept of correlation, and it was shown that in many applications the correlation
measurement can reveal some very important patterns. The chi-squared test was used to test the
correlation among items. Instead of explicitly enumerating all correlated itemsets, the border ....


S. Brin, R. Motwani, C. Silverstein. Beyond market baskets: generalizing association rules to
correlations. Proc. ACM SIGMOD Conf. on Management of Data, 265-276, 1997.
....respectively, to the minimal infrequent and maximal frequent sets for D. The generation of (maximal)
frequent sets of a given binary matrix is an important task of knowledge discovery and data
mining, e.g. it is used for mining association rules [7, 31, 52, 53, 56, 57, 70], correlations [20],
sequential patterns [2], episodes [54], emerging patterns [25], and appears in many other
applications. Most practical procedures to generate frequent sets are based on the anti-monotone Apriori
heuristic (see [1]) and build frequent sets in a bottom-up way, running in time proportional to the ....
....find cardinalities of subgroups or significance of deviations, etc. Typically, the patterns whose
frequencies are needed are conjunctions of atomic patterns. A prime example is given by the frequent
set concept underlying association rules [2, 3]. Moreover, the patterns defined for correlation [6, 7],
causality [18], sequential patterns [4], episodes [13], constrained frequent sets [11, 14, 19], long
patterns [1, 5], closed sets [16], and many other important data mining tasks have the same basic
form. In all these cases, we have instances of the following abstract problem. Given a collection ....
....the frequent itemsets and their frequencies are needed. This is an important point, and in the rest of
this paper we consider that the generation of association rules does not need to access the
transactional database (this is still the case when using other objective measures such as the ....

....association rule constraint C, let us study different strategies to support constrained association rule
mining ....
....a counter is set up and the algorithm then passes over the complete database of transactions.
Whenever a transaction contains one of the candidates, its counter is incremented. Efficiently looking up
candidates in transactions requires specialized data structures, e.g. hash trees or prefix trees, cf. [3, 6].
Alternatively, the support values of candidates can be determined indirectly by set intersections. For that
purpose, so-called transaction sets are employed. The transaction set X.tids of an itemset X is defined as
the set of all transactions this itemset is contained in: $X.\mathrm{tids} = \{T \in D \mid X \subseteq T\}$ ....
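The prefix-tree option for counting candidates in one pass can be illustrated with a small Python sketch, assuming items are mutually comparable (so each itemset has a single sorted order) and candidates are given as frozensets. The class and function names are hypothetical, and hash trees, the other structure named above, are not shown.

```python
class PrefixTreeNode:
    """One node of a prefix tree (trie) over sorted candidate itemsets."""

    def __init__(self):
        self.children = {}  # item -> PrefixTreeNode
        self.count = None   # support counter; None if no candidate ends here


def build_prefix_tree(candidates):
    root = PrefixTreeNode()
    for candidate in candidates:
        node = root
        for item in sorted(candidate):
            node = node.children.setdefault(item, PrefixTreeNode())
        node.count = 0  # a candidate ends at this node
    return root


def count_transaction(node, items, start):
    """Increment every candidate below `node` that is contained in items[start:]."""
    for i in range(start, len(items)):
        child = node.children.get(items[i])
        if child is None:
            continue  # no candidate continues with this item
        if child.count is not None:
            child.count += 1
        count_transaction(child, items, i + 1)


def count_supports(candidates, transactions):
    """One pass over the transactions, counting all candidates at once."""
    root = build_prefix_tree(candidates)
    for transaction in transactions:
        count_transaction(root, sorted(transaction), 0)
    supports = {}
    for candidate in candidates:
        node = root
        for item in sorted(candidate):
            node = node.children[item]
        supports[frozenset(candidate)] = node.count
    return supports


candidates = [frozenset("ab"), frozenset("ac"), frozenset("bd"), frozenset("cd")]
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c", "d"}, {"b", "d"}]
supports = count_supports(candidates, transactions)
print(sorted(("".join(sorted(c)), n) for c, n in supports.items()))
# [('ab', 2), ('ac', 2), ('bd', 1), ('cd', 1)]
```

Because transaction items are sorted and distinct, each contained candidate is reached along exactly one path, so it is counted once per transaction.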

....Partition [26] combines the breadth-first search of Apriori with determining the support values of the
candidates indirectly by set intersections. In order to keep all necessary transaction sets
comfortably in main memory, the database typically needs to be partitioned. The algorithm DIC [6]
enhances Apriori by relaxing the strict separation between candidate generation and counting the
candidates. Already during the pass over the transactions, new candidates are generated and added to the
set of candidates on the fly. This helps to significantly reduce the total number of the ....
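The Partition idea can be sketched in Python as follows, assuming the transactions fit in memory as Python sets and the minimum support is given as a fraction `min_fraction`; the function names are hypothetical and this is not the authors' implementation. The sketch relies on the fact that an itemset frequent in the whole database must be frequent in at least one partition, so the union of locally frequent itemsets is a complete candidate set that a second scan only has to verify.

```python
from itertools import combinations


def local_frequent(partition, min_fraction):
    """Itemsets frequent inside one partition, found level by level with tidset intersections."""
    threshold = min_fraction * len(partition)
    tidsets = {}
    for tid, transaction in enumerate(partition):
        for item in transaction:
            tidsets.setdefault(frozenset([item]), set()).add(tid)
    level = {s: tids for s, tids in tidsets.items() if len(tids) >= threshold}
    found = set(level)
    while level:
        next_level = {}
        for (a, tids_a), (b, tids_b) in combinations(level.items(), 2):
            union = a | b
            if len(union) == len(a) + 1 and union not in next_level:
                tids = tids_a & tids_b  # support obtained indirectly by intersection
                if len(tids) >= threshold:
                    next_level[union] = tids
        found.update(next_level)
        level = next_level
    return found


def partition_mine(transactions, min_fraction, num_partitions):
    """Two-scan frequent itemset mining in the style of the Partition algorithm."""
    transactions = [frozenset(t) for t in transactions]
    if not transactions:
        return {}
    size = -(-len(transactions) // num_partitions)  # ceiling division
    # Scan 1: union of the locally frequent itemsets of each partition.
    candidates = set()
    for start in range(0, len(transactions), size):
        candidates |= local_frequent(transactions[start:start + size], min_fraction)
    # Scan 2: count every candidate once over the whole database.
    threshold = min_fraction * len(transactions)
    result = {}
    for candidate in candidates:
        count = sum(1 for t in transactions if candidate <= t)
        if count >= threshold:
            result[candidate] = count
    return result


data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}, {"d"}]
frequent = partition_mine(data, min_fraction=0.5, num_partitions=2)
print(sorted(("".join(sorted(s)), n) for s, n in frequent.items()))
# [('a', 4), ('ab', 3), ('ac', 3), ('b', 4), ('bc', 3), ('c', 4)]
```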
....respectively, to the minimal infrequent and maximal frequent sets for D. The generation of (maximal)
frequent sets of a given binary matrix is an important task of knowledge discovery and data
mining, e.g. it is used for mining association rules [7, 32, 53, 54, 58, 59, 74], correlations [20],
sequential patterns [2], episodes [55], emerging patterns [25], and appears in many other
applications. Most practical procedures to generate frequent sets are based on the anti-monotone Apriori
heuristic (see [1]) and build frequent sets in a bottom-up way, running in time proportional to ....